I. INTRODUCTION



A. PURPOSE AND SUMMARY OF RESULTS 

The purpose of this study is to develop the logistic regression alternative for
estimating attrition rates, using length of service and grade as carrier variables.
Ideally, the regression coefficients would show temporal stability and would not be
highly dependent upon the occupational specialty. It is hoped that this development
can enhance the previously developed understanding of the attrition process as
reflected in the United States Marine Corps officer manpower data.

Unfortunately, the logistic regression approach to this problem does not improve
upon the estimators developed by earlier workers. See Table 8 on page 30. It does,
however, contribute to the understanding of the attrition process as it relates to length
of service and grade. The partial regression coefficients can serve in ad hoc calculations
to indicate the direction of change and to make rough estimates of its amount. The
coefficients do, however, change substantially as the military occupational specialty
changes. See Table 7 on page 24. The aviation community in particular appears to have
coefficients quite different from those of other communities.

B. BACKGROUND 

The first step in any manpower planning should be a good description of the
system or organization. Such a description makes it possible to obtain reasonable
forecast values. Forecasts should never be interpreted as what will happen but as central
estimates of what could happen if the assumed trends continue. They therefore provide a
guide for the management action required to achieve a desired objective. Good forecast
values, in turn, depend upon finding efficient ways to estimate attrition rates. In other
words, the description of the system, the attrition rates and the forecasts are each
dependent on one another.

The forecasts made by manpower planning models are affected by three general
factors: existing inventory, projected losses and projected gains. In order to project the
inventory into various future time periods, it is necessary to forecast the future values
using a realistic system of flow rates.
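The way the three factors combine can be illustrated with a one-cell stock-flow recursion: next year's inventory equals the current inventory, minus projected losses (attrition rate times inventory), plus projected gains. The sketch below is only an illustration under stated assumptions; the single aggregate cell, the constant 10 percent loss rate, the 80 gains per year and the function name `project_inventory` are all hypothetical, whereas an actual planning model tracks flows among many MOS, LOS and grade cells.

```python
def project_inventory(inventory, loss_rate, gains_per_year, years):
    """Project one cell's inventory: N(t+1) = N(t) - loss_rate * N(t) + gains.

    Hypothetical one-cell illustration with a constant attrition rate and
    constant gains; real planning models track many interacting cells.
    """
    path = [float(inventory)]
    for _ in range(years):
        current = path[-1]
        losses = loss_rate * current                     # projected losses
        path.append(current - losses + gains_per_year)   # existing - losses + gains
    return path


projection = project_inventory(1000, 0.10, 80, 3)
print(projection)
```

Because projected losses are proportional to the inventory, the estimated attrition rate drives every step of the projection, which is why efficient attrition-rate estimates matter for the forecasts.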

Estimation techniques for USMC officer attrition rates were developed by Major
D.D. Tucker in a thesis [Ref. 1] submitted at the Naval Postgraduate School in
September 1985, and further by Major John R. Robinson in a thesis [Ref. 2] submitted
at the Naval Postgraduate School in March 1986. They used James-Stein and other
shrinkage-type parameter estimators to generate stable manpower loss rates. The
reader is referred to Tucker [Ref. 1] and Robinson [Ref. 2] for most of the background
information and the data structure used; by necessity, some of that information is
repeated in this paper.

The United States Marine Corps has about 20,000 officers. These can be cross-
classified into 40 military occupational specialties (MOS), 31 length of service (LOS)
cells and 10 grades, giving 12,400 categories for manpower planning purposes. About
half of these categories are unoccupied for structural reasons; these structural zero
categories are described in Chapter III. The officer attrition and promotion structure
was described by Tucker [Ref. 1].

One goal of this paper is to examine whether the logistic regression model is an
efficient way to estimate the attrition rates (i.e. the rate of leaving the service, not of
changes in MOS, LOS or grade) for the officer MOS, LOS and grade categories. This
problem is difficult because of the large number of cells with low inventory. Tucker
[Ref. 1] and Robinson [Ref. 2] treated this small-cell problem by collecting the cells
into major groups or aggregates, attempting to aggregate cells that were believed to
have common statistical behavior. In the present work we do not collect the cells into
major groups; every MOS is taken individually. The structural zero cells are dropped
before applying the fitting procedure; that is, structural zero cells are not included in
the regression equations.

There are seven years of data available for the present study. The first four years
(1977 to 1980) are used for model development and logistic regression fitting, and
the last three years (1981 to 1983) for validation.

C. ORGANIZATION 

Chapter II contains the details of the methodology and notation used in the 
present work. A brief summary of the generalized linear regression model is presented 
in this chapter. 

Chapter III explains the logistic regression model structure for the Marine Corps
data and the validation procedure. A numerical example is given to illustrate the
fitting and validation procedures. This chapter also compares figures of merit with
Robinson's [Ref. 2] results.

Chapter IV thoroughly discusses the results and recommendations. 
Appendix A includes the APL functions for the data manipulation, the logistic
regression and the validation of the model. 

Appendix B illustrates the logistic probability plots of residuals and the plots of 
the residuals vs. fitted values for selected cases. 
II. METHOD OF ESTIMATION



A. INTRODUCTION 

A major use of regression models is prediction. Given data on a response
variable $y$ and associated predictor variables $x_i$ ($i = 1$ to $p$), the aim of regression
is to find a function of the $x_i$ which is, in some sense, a good predictor of $y$. It is
assumed throughout that the $x_i$ at which future predictions are required are not
specified in advance but occur randomly over some population of values, and that
the success of prediction can be judged by its performance over such a population.

Logistic regression is a member of the class of generalized linear models. An
overview of the linear model is given in the following section. The approach and
background for the logistic regression model are taken from Pregibon's [Ref. 3]
paper.

B. AN OVERVIEW OF THE LINEAR REGRESSION MODEL 

Linear regression is used to relate a response variable $y_i$ to one or several
explanatory or descriptive variables $x_{ij}$ through a set of linear equations of the form

$$ y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n $$

The $y_i$ (for $i = 1$ to $n$) are the $n$ observed values of the response variable, the $x_{ij}$
(for $i = 1$ to $n$) are the $n$ values of the $j$th explanatory variable (for $j = 1$ to $p$), and
the parameters $\beta_j$ are the unknown regression coefficients. The $\varepsilon_i$ are the random
"errors" or fluctuations. The variables $x_{ij}$ and $y_i$ are sometimes called "independent"
and "dependent" variables, respectively.

The linear equation above can be simplified by defining an extra variable $x_{i0}$
whose value is always 1 ($x_{i0} = 1$), so that the model with a constant term can be
written as

$$ y_i = \sum_{j=0}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n $$

Usually the $\varepsilon_i$ are assumed to be statistically independent of each other, with zero
means and a constant variance that does not depend on $i$ or $x_{ij}$.

In regression we usually want to estimate the regression coefficients from the
data, either because we want to know and interpret the coefficients themselves, or
because we will use them to predict future values of $y$. Upon replacing the $\beta_j$ by their
estimated values $\hat\beta_j$, we obtain the fitted (or "predicted") values $\hat y_i$,

$$ \hat y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij}, \qquad i = 1, \ldots, n $$

The residuals $e_i$ are defined as the differences between the observed and the fitted
values:

$$ e_i = y_i - \hat y_i, \qquad i = 1, \ldots, n $$

The residuals are used in many diagnostic displays because they contain most of
the information regarding lack of fit of the model to the data. In terms of fitted values
and residuals, we have

data = fit + residual

which in mathematical notation is expressed as

$$ y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij} + e_i, \qquad i = 1, \ldots, n $$



In matrix notation the least-squares estimate $\hat\beta$ minimizes

$$ \phi = \|e\|^2 = \|y - X\beta\|^2 = (y - X\beta)^T (y - X\beta) $$

where $e$ is the vector of residuals, $\|e\|^2$ is the squared length of the residual vector
and $\hat y = X\hat\beta$ is the vector of fitted values. Expanding the product gives

$$ \phi = y^T y - 2 y^T X\beta + \beta^T X^T X \beta $$

Taking the derivative of $\phi$ with respect to $\beta$, $\partial\phi/\partial\beta = -2X^T y + 2X^T X\beta$,
and setting it equal to 0, the least-squares estimate $\hat\beta$ is obtained by solving the
normal equations

$$ X^T y - X^T X \hat\beta = 0 $$

The solution of this linear system is

$$ \hat\beta = (X^T X)^{-1} X^T y $$

which is sensitive to poorly fitting observations and extreme design points.
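The normal-equation solution can be checked numerically. The following is a minimal pure-Python sketch (the helper names `solve` and `least_squares` are hypothetical) that forms $X^T X$ and $X^T y$ and solves the resulting small system by Gaussian elimination; it is meant to mirror the derivation, not to serve as a production solver.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares(X, y):
    """Return beta_hat solving the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# First column is the constant x_i0 = 1
X = [[1, 0, 1], [1, 1, 0], [1, 2, 3], [1, 3, 1], [1, 4, 2]]
y_exact = [2 + 0.5 * r[1] - 1.5 * r[2] for r in X]      # lies exactly on a plane
beta_exact = least_squares(X, y_exact)                   # recovers (2, 0.5, -1.5)

y_noisy = [3.0, 1.0, 0.0, 2.0, 5.0]
beta_noisy = least_squares(X, y_noisy)
residuals = [yi - sum(b * xj for b, xj in zip(beta_noisy, r)) for r, yi in zip(X, y_noisy)]
```

The second fit illustrates what the normal equations say directly: the residual vector is orthogonal to every column of $X$. Forming $X^T X$ explicitly follows the algebra above; for ill-conditioned designs, a QR decomposition of $X$ is the numerically safer route to the same $\hat\beta$.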

There is now a fairly large battery of diagnostics available for detecting which
observations exert undue influence on $\hat\beta$. The two basic quantities that are most
useful for this purpose are the residuals, $e_i = y_i - x_i\hat\beta$, and the projection matrix

$$ M = I - H = I - X(X^T X)^{-1} X^T $$

where $H$ is called the hat matrix. Essentially, the vector $e$ describes the deviation of
the observed data from the fit, and $M$ the subspace in which $e$ lies ($e = My$).

As a bottom line, the residual vector $e$ is important for the detection of ill-fitting
points, but it will not adequately point to observations which unduly influence the fit.
In particular, large residuals are seldom associated with high-leverage points, whereas
small residuals (which usually pass our inspection unnoticed) are often of the opposite
character: since $\mathrm{var}(e_i) = \sigma^2 m_{ii}$, a point with a large diagonal element $h_{ii}$ of $H$
(small $m_{ii}$) pulls the fit toward itself and tends to have a small residual.
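The point about leverage can be seen in a small example. The sketch below (pure Python; the helper names `invert` and `leverage_and_residuals` are hypothetical) computes the hat-matrix diagonal $h_{ii}$, the corresponding $m_{ii} = 1 - h_{ii}$, and the residuals for a straight-line fit in which the last point lies far out in the design space and well off the line through the other four. That point has by far the largest leverage, yet its residual is smaller than some residuals of well-behaved points, which is why residuals alone can miss it.

```python
def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    aug = [[float(v) for v in A[i]] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def leverage_and_residuals(X, y):
    """Hat-matrix diagonal h_ii, m_ii = 1 - h_ii, and least-squares residuals e_i."""
    p = len(X[0])
    XtX_inv = invert([[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)])
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = [sum(XtX_inv[a][b] * Xty[b] for b in range(p)) for a in range(p)]
    h = [sum(x[a] * XtX_inv[a][b] * x[b] for a in range(p) for b in range(p)) for x in X]
    e = [yi - sum(bj * xj for bj, xj in zip(beta, x)) for x, yi in zip(X, y)]
    return h, [1.0 - hi for hi in h], e


X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 10]]
y = [1, 2, 3, 4, 20]          # last point: far out in x and well off the line y = 1 + x
h, m, e = leverage_and_residuals(X, y)
```

The diagonal elements of $H$ always sum to the number of columns of $X$ (here 2), so the $m_{ii}$ sum to $n - p$.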

C. BACKGROUND AND NOTATION FOR THE LOGISTIC REGRESSION 

1. General 

A maximum likelihood fit of a regression model is extremely sensitive to 
outlying responses and extreme points in the design space. 

Classically, logistic regression models were fitted to data obtained under 
experimental conditions, for example, bioassay and related dose-response applications. 
The current use of logistic regression methods includes the analysis of data obtained in 
observational studies. In contrast to controlled experimentation, data from such 
studies can be notoriously "bad" both from the point of view of outlying responses (y), 
and from the point of view of extreme points in the design space (X). The usual 
method of fitting logistic regression models, maximum likelihood, has good optimality 
properties in ideal settings, but is extremely sensitive to "bad" data of the above types. 

Nevertheless, good data analysis for logistic regression models need not be
expensive or time consuming.

2. Unstructured case 

Consider a single binomial response $y \sim B(n, p)$. If we let $\theta = \mathrm{logit}(p) =
\log\{p/(1 - p)\}$, the probability function of $y$ can be written as

$$ f(y; \theta) = \exp\{ y\theta - a(\theta) + b(y) \}, \qquad y = 0, 1, \ldots, n $$

with $a(\theta) = n \log(1 + e^{\theta})$, $b(y) = \log \binom{n}{y}$, and where throughout this paper
$\log(\cdot) = \log_e(\cdot)$. Up to an arbitrary constant, the logarithm of $f(y; \theta)$,

$$ l(\theta; y) = y\theta - a(\theta) + b(y) $$

is the loglikelihood function of $\theta$. The score and information functions are given by

$$ s(\theta; y) = \frac{\partial l(\theta; y)}{\partial \theta} = y - \dot a(\theta) = y - np $$

$$ i(\theta; y) = -\frac{\partial s(\theta; y)}{\partial \theta} = \ddot a(\theta) = np(1 - p) $$

where $a$ with $k$ dots above it denotes $(\partial^k / \partial\theta^k)\, a(\theta)$. Standard results yield
$E(y) = np = \dot a(\theta)$ and $\mathrm{Var}(y) = np(1 - p) = \ddot a(\theta)$. Also, since $s(\hat\theta; y) = 0$ at the
maximum likelihood estimate (m.l.e.) of $\theta$, we have $\hat\theta = \dot a^{-1}(y) = \mathrm{logit}(y/n)$ as the
m.l.e. of $\theta$ based on a single binomial observation $y$.

Now consider a sample of $N$ independent binomial responses $y_i \sim B(n_i, p_i)$. The
loglikelihood function for the sample is the sum of the individual loglikelihood
contributions:

$$ l(\theta; y) = \sum_{i=1}^{N} l(\theta_i; y_i) = \sum_{i=1}^{N} \{ y_i\theta_i - a(\theta_i) + b(y_i) \} $$
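These formulas translate directly into code. The sketch below (hypothetical function names; integer $n$ and $y$ are assumed so that $b(y) = \log\binom{n}{y}$ can be computed with `math.comb`, available in Python 3.8 and later) evaluates the loglikelihood, score and information for one binomial observation, confirms that the score vanishes at $\hat\theta = \mathrm{logit}(y/n)$, and sums the individual contributions for a sample.

```python
import math


def a(theta, n):
    """Cumulant function a(theta) = n log(1 + e^theta)."""
    return n * math.log1p(math.exp(theta))


def loglik(theta, y, n):
    """l(theta; y) = y*theta - a(theta) + b(y), with b(y) = log C(n, y)."""
    return y * theta - a(theta, n) + math.log(math.comb(n, y))


def score(theta, y, n):
    """s(theta; y) = y - a_dot(theta) = y - n p."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return y - n * p


def information(theta, n):
    """i(theta) = a_ddot(theta) = n p (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return n * p * (1.0 - p)


def mle_single(y, n):
    """theta_hat = logit(y / n); undefined (infinite) when y = 0 or y = n."""
    return math.log(y / (n - y))


def sample_loglik(thetas, ys, ns):
    """Loglikelihood of N independent binomials: the sum of individual contributions."""
    return sum(loglik(t, y, n) for t, y, n in zip(thetas, ys, ns))


theta_hat = mle_single(3, 10)    # logit(0.3)
```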

3. The logistic regression model 

The likelihood function $l(\theta; y)$ is over-specified: there are as many parameters
as observations. Given a set of $m$ explanatory variables $(X_1, X_2, \ldots, X_m)$, the logistic
regression model utilizes the relationship

$$ \theta = \mathrm{logit}(p) = X\beta $$

as the description of the systematic component of the response $y$. In terms of the
$m$-dimensional parameter $\beta$, we have the loglikelihood function

$$ l(X\beta; y) = \sum_{i=1}^{N} l(x_i\beta; y_i) = \sum_{i=1}^{N} \{ y_i x_i\beta - a(x_i\beta) + b(y_i) \} $$

The m.l.e. maximizes the above equation and is a solution (assumed unique) to
$(\partial/\partial\beta)\, l(X\beta; y) = 0$. In particular, $\hat\beta$ satisfies the system of equations

$$ \sum_{i=1}^{N} x_{ij} \{ y_i - \dot a(x_i\hat\beta) \} = 0, \qquad j = 1, \ldots, m $$

Writing $s = y - \dot a(X\beta) = y - np$, the likelihood equations become

$$ X^T s = X^T (y - \hat y) = 0 $$

where $\hat y = n\hat p$ and $T$ denotes the transpose. These equations, although very similar to
their normal-theory counterparts, are nonlinear in $\beta$, and iterative methods are required
to solve them. Typically, when second derivatives are easy to compute (here
$-(\partial/\partial\beta)\, X^T s = X^T V X$ with $V = \mathrm{diag}\{\ddot a(x_i\beta)\}$), the Newton-Raphson method is
employed. This leads to the iterative scheme

$$ \beta^{t+1} = \beta^{t} + (X^T V X)^{-1} X^T s $$

where both $V$ and $s$ are evaluated at $\beta^t$. At convergence ($t = u$), we take $\hat\beta = \beta^u$, and
denote the fitted values $n_i\hat p_i$ by $\hat y_i$. The estimated variance of $y_i$ is $v_{ii} = n_i\hat p_i(1 - \hat p_i)$.

A most useful way to view the iterative process outlined above is as iteratively
reweighted least squares (IRLS). This is obtained by employing the pseudo-observation
vector $z^t = X\beta^t + V^{-1}s$, for which the above equation becomes

$$ \beta^{t+1} = (X^T V X)^{-1} X^T V z^t $$

At convergence, we have $z = X\hat\beta + V^{-1}s$. Thus we may write the maximum
likelihood estimator of $\beta$ as

$$ \hat\beta = (X^T V X)^{-1} X^T V z $$
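The IRLS form of the Newton-Raphson iteration is short to write out. The following pure-Python sketch (hypothetical names `solve` and `fit_logistic_irls`; the starting value $\beta^0 = 0$ and the step-size stopping rule are assumptions, not details from the source) builds $V$, $s$ and the pseudo-observations $z^t$ at each step and solves $(X^T V X)\beta^{t+1} = X^T V z^t$. As a check, it uses noise-free responses $y_i = n_i p_i$, for which the likelihood equations $X^T(y - np) = 0$ hold exactly at the true coefficients, so the fit should return them.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_logistic_irls(X, y, n, tol=1e-10, max_iter=50):
    """m.l.e. of beta in logit(p_i) = x_i beta with y_i ~ B(n_i, p_i), by IRLS."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        eta = [sum(bj * xj for bj, xj in zip(beta, x)) for x in X]     # X beta^t
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        v = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]             # diagonal of V
        s = [yi - ni * pi for yi, ni, pi in zip(y, n, p)]              # s = y - n p
        z = [e + si / vi for e, si, vi in zip(eta, s, v)]              # z^t = X beta^t + V^-1 s
        XtVX = [[sum(vi * x[a] * x[c] for x, vi in zip(X, v)) for c in range(k)]
                for a in range(k)]
        XtVz = [sum(vi * x[a] * zi for x, vi, zi in zip(X, v, z)) for a in range(k)]
        new_beta = solve(XtVX, XtVz)                                   # (X'VX)^-1 X'V z^t
        if max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("IRLS did not converge")


true_beta = [0.5, -0.2, -0.3]
X = [[1.0, float(los), float(gr)] for los in range(1, 6) for gr in range(1, 4)]
n = [20.0] * len(X)
# Noise-free responses y_i = n_i p_i, for which the m.l.e. equals true_beta
y = [ni / (1.0 + math.exp(-sum(b * xj for b, xj in zip(true_beta, x)))) for ni, x in zip(n, X)]
beta_hat = fit_logistic_irls(X, y, n)
```

Because the logit is the canonical link for the binomial, the observed and expected information coincide, so this Newton-Raphson iteration is the same as Fisher scoring. The binomial totals $n_i$ enter only through $n_i p_i$ and $n_i p_i(1 - p_i)$, so they need not be integers.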



4. Output from a maximum likelihood fit 

Once the model has been fitted (that is, we have the m.l.e. $\hat\beta$), various
quantities from the fitting process are available for data analysis. Typically, these
quantities consist of subsets of the following:

1. the estimated parameter vector, $\hat\beta$;

2. the individual coefficient standard errors, $\mathrm{s.e.}(\hat\beta_j)$;

3. the estimated covariance matrix of $\hat\beta$, $\mathrm{var}(\hat\beta) = (X^T V X)^{-1}$;

4. the chi-squared goodness-of-fit statistic, $\chi^2 = \sum_i s_i^2 / v_{ii}$;

5. the individual components of $\chi^2$, namely $\chi_i = s_i/\sqrt{v_{ii}} = (y_i - n_i\hat p_i)/\sqrt{n_i\hat p_i(1 - \hat p_i)}$;

6. the deviance $D = -2\{ l(X\hat\beta; y) - l(\tilde\theta; y) \}$, where $l(\tilde\theta; y)$ refers to the maximum
of the loglikelihood function based on fitting each point exactly, i.e., $\tilde\theta_i =
\mathrm{logit}(y_i/n_i)$.

Asymptotic arguments suggest that the deviance and chi-squared statistics
have the same limiting null $\chi^2(N - m)$ distribution, and hence provide some measure of
the appropriateness of the fitted model.
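Items 4 and 5 can be computed directly from the fitted probabilities. The sketch below (hypothetical function names) returns the components $\chi_i$, their sum of squares $\chi^2$, and the $N - m$ degrees of freedom of the asymptotic reference distribution; the three-cell input is made up purely to show the bookkeeping.

```python
import math


def pearson_components(y, n, p_hat):
    """chi_i = (y_i - n_i p_i) / sqrt(n_i p_i (1 - p_i))."""
    return [(yi - ni * pi) / math.sqrt(ni * pi * (1.0 - pi))
            for yi, ni, pi in zip(y, n, p_hat)]


def pearson_chi2(y, n, p_hat, n_params):
    """Return (chi-squared statistic, degrees of freedom N - m)."""
    chi = pearson_components(y, n, p_hat)
    return sum(c * c for c in chi), len(chi) - n_params


# Made-up inputs: three cells of size 4
y = [0, 4, 2]
n = [4, 4, 4]
p_hat = [0.25, 0.75, 0.5]
chi = pearson_components(y, n, p_hat)
stat, df = pearson_chi2(y, n, p_hat, n_params=1)
```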
D. THE BASIC BUILDING BLOCKS OF REGRESSION DIAGNOSTICS



1. Preliminaries 

After fitting a logistic regression model, and prior to drawing inferences from
it, the natural next step is to assess the fit critically. In practice, however, this
assessment is seldom carried out. The basic reasons are

1. the lack of routine methods for performing such an analysis, and 

2. the presumably high cost of doing so. 

The role of a regression diagnostician is to provide routine methods of model 
sensitivity analysis which are both intuitively appealing and inexpensive. Clearly this 
requires a thorough understanding of the model and the nature of the fitting process. 

2. The basic building blocks 

For the logistic regression model, the basic building blocks for the
identification of outlying and influential points will again be the residual vector and a
projection matrix. For the linear model, residuals are rather uniquely defined (apart
from standardization), whereas for the logistic regression model, residuals can be
defined on several (at least three) scales. The two most useful are the components of
chi-square, given above in item 5, and the components of deviance, $D = \sum_i d_i^2$, where

$$ d_i = \pm \left[ 2\{ l(\tilde\theta_i; y_i) - l(x_i\hat\beta; y_i) \} \right]^{1/2} $$

and the plus or minus sign is used according as $\tilde\theta_i > x_i\hat\beta$ or $\tilde\theta_i < x_i\hat\beta$. Note that $d_i$ is
defined for all values of $y_i$ even though $\tilde\theta_i$ may not be. In particular, at $y_i = 0$,
$d_i^2 = -2n_i \log(1 - \hat p_i)$, and at $y_i = n_i$, $d_i^2 = -2n_i \log(\hat p_i)$. Both $\chi^2$ and $D$ are
measures of the goodness of fit of the model.
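The deviance components, including the boundary cases $y_i = 0$ and $y_i = n_i$, can be computed without forming $\tilde\theta_i$: the $b(y_i)$ terms cancel and $2\{l(\tilde\theta_i; y_i) - l(x_i\hat\beta; y_i)\} = 2[y_i \log\{y_i/(n_i\hat p_i)\} + (n_i - y_i)\log\{(n_i - y_i)/(n_i - n_i\hat p_i)\}]$, with $0 \log 0$ taken as 0. A sketch with hypothetical function names and made-up inputs:

```python
import math


def xlog_ratio(x, ratio):
    """x * log(ratio), taking 0 * log(0) = 0 for the boundary cases."""
    return 0.0 if x == 0 else x * math.log(ratio)


def deviance_components(y, n, p_hat):
    """Signed deviance residuals d_i; the sign follows y_i - n_i p_i,
    which is equivalent to comparing theta_tilde_i with x_i beta_hat."""
    d = []
    for yi, ni, pi in zip(y, n, p_hat):
        fit = ni * pi
        dev = 2.0 * (xlog_ratio(yi, yi / fit) + xlog_ratio(ni - yi, (ni - yi) / (ni - fit)))
        d.append(math.copysign(math.sqrt(max(dev, 0.0)), yi - fit))
    return d


y = [0, 4, 2]                 # includes both boundary cases y = 0 and y = n
n = [4, 4, 4]
p_hat = [0.25, 0.75, 0.5]
d = deviance_components(y, n, p_hat)
D = sum(di * di for di in d)  # the deviance is the sum of squared components
```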

The analog of the projection matrix for the logistic model will also be denoted
by $M$, which in its general form is given as

$$ M = I - H = I - V^{1/2} X (X^T V X)^{-1} X^T V^{1/2} $$

The usefulness of $M$ arises as a consequence of the IRLS formulation described earlier.
In particular, as $\hat\beta = (X^T V X)^{-1} X^T V z$, the vector of pseudo-residuals is given by

$$ z - X\hat\beta = \{ I - X (X^T V X)^{-1} X^T V \} z = V^{-1/2} M V^{1/2} z $$

Using the fact that $z = X\hat\beta + V^{-1}s$ (and that $M V^{1/2} X = 0$), this can be written as
$V^{-1}s = V^{-1/2} M V^{-1/2} s$. Premultiplication by the diagonal matrix $V^{1/2}$ yields $\chi = M\chi$,
where $\chi = V^{-1/2}s$ is the vector of chi-square components. Thus, as in the linear model
case, $M$ is symmetric, idempotent and spans the residual ($\chi$) space. This suggests that
small $m_{ii}$, the diagonal elements of the projection matrix $M$, should be useful in
detecting extreme points in the design space.

In most cases, the examination of $\chi_i$, $d_i$ and $m_{ii}$ will call attention to outlying
and influential points. In some cases, combinations of these (for example, studentized
residuals) will also be useful. For displaying these quantities, index plots are generally
suggested (and strongly so if the order of the observations is important): that is, plots
of $\chi_i$ vs $i$, $d_i$ vs $i$ and $m_{ii}$ vs $i$. In particular cases, plots of these building blocks against
the fitted values could prove useful.
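The diagonal elements of $M$ need only $V$ and $(X^T V X)^{-1}$: $m_{ii} = 1 - v_{ii}\, x_i (X^T V X)^{-1} x_i^T$. The sketch below (hypothetical helper names; made-up design and fitted probabilities) computes them and checks two properties implied by the derivation above: because $M$ is a symmetric idempotent projection, its trace equals $N - m$, and the point far out in the design space gets the smallest $m_{ii}$.

```python
import math


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    aug = [[float(v) for v in A[i]] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def logistic_m_diag(X, n, p_hat):
    """Diagonal of M = I - V^(1/2) X (X'VX)^(-1) X' V^(1/2), V = diag(n_i p_i (1 - p_i))."""
    k = len(X[0])
    v = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p_hat)]
    XtVX = [[sum(vi * x[a] * x[b] for x, vi in zip(X, v)) for b in range(k)] for a in range(k)]
    W = invert(XtVX)
    return [1.0 - vi * sum(x[a] * W[a][b] * x[b] for a in range(k) for b in range(k))
            for x, vi in zip(X, v)]


# Made-up example: intercept plus one covariate; the last point is far out
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 8.0]]
n = [10.0] * 5
p_hat = [1.0 / (1.0 + math.exp(-(-1.0 + 0.3 * x[1]))) for x in X]
m_diag = logistic_m_diag(X, n, p_hat)
```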
III. MODEL BUILDING WITH USMC MANPOWER DATA



A. GENERAL 

Robinson [Ref. 2] explains the conversion of the raw data to an APL workspace; a
brief explanation of the conversion is given in Appendix A. The summary data file
classifies the Marine Corps officer inventory into 40 military occupational specialties,
10 grade levels, 31 length-of-service cells and 8 loss categories. In the present study we
do not distinguish among the types of loss; these were described by Tucker [Ref. 1].
For use in our model, grades and military occupational specialties (MOS) are defined by
Tables 1 and 2. When reference is made to a particular grade or group of grades, the
code number from Table 1 is used instead of the name of the grade. For example, this
project refers to the grades first lieutenant, captain and major as numbers 5, 6 and 7,
respectively. Tucker and Robinson used data code numbers for the MOS instead of the
actual MOS; for example, this project refers to the air traffic control MOS as number 37,
not 73. It should also be understood that the two-digit MOS identifier listed in Table 2
is strictly the military occupational specialty identifier in the USMC MOS manual. We
also use the data code number from Table 2 for the MOS. The column containing the
letters A through E refers to the structural zero categories.



TABLE 1
GRADES

CODE    GRADE
0       WARRANT OFFICER (WO-1)
1       CHIEF WARRANT OFFICER (CWO-2)
2       CHIEF WARRANT OFFICER (CWO-3)
3       CHIEF WARRANT OFFICER (CWO-4)
4       SECOND LIEUTENANT
5       FIRST LIEUTENANT
6       CAPTAIN
7       MAJOR
8       LIEUTENANT COLONEL
9       COLONEL





TABLE 2
MILITARY OCCUPATIONAL SPECIALTIES (MOS)

DATA CODE   MOS   CAT   MOS TITLE
00          UN    A     UNKNOWN
01          01    A     PERSONNEL AND ADMINISTRATION
02          02    A     INTELLIGENCE
03          03    C     INFANTRY
04          04    A     LOGISTICS
05          08    A     FIELD ARTILLERY
06          11    D     UTILITIES
07          13    A     ENGINEER, CONSTRUCTION AND EQUIPMENT
08          14    D     DRAFTING, SURVEYING AND MAPPING
09          15    D     PRINTING AND REPRODUCTION
10          18    C     TANK AND AMPHIBIAN TRACTOR
11          21    A     ORDNANCE
12          23    B     AMMUNITION AND EXPLOSIVE ORDNANCE DISPOSAL
13          25    A     OPERATIONAL COMMUNICATIONS
14          26    A     SIGNALS INTELLIGENCE/GROUND ELECTRONIC WARFARE
15          28    B     DATA/COMMUNICATIONS MAINTENANCE
16          30    A     SUPPLY ADMINISTRATION AND OPERATIONS
17          31    A     TRANSPORTATION
18          33    A     FOOD SERVICE
19          34    A     AUDITING, FINANCE AND ACCOUNTING
20          35    A     MOTOR TRANSPORT
21          40    A     DATA SYSTEMS
22          41    B     MARINE CORPS EXCHANGE
23          43    A     PUBLIC AFFAIRS
24          44    A     LEGAL SERVICES
25          46    A     TRAINING AND AUDIOVISUAL SUPPORT
26          55    B     BAND
27          57    D     NUCLEAR, BIOLOGICAL AND CHEMICAL
28          58    A     MILITARY POLICE AND CORRECTIONS
29          59    B     ELECTRONICS MAINTENANCE
30          60    A     60XX
31          61    A     AIRCRAFT MAINTENANCE
32          63    B     AVIONICS
33          65    B     AVIATION ORDNANCE
34          68    B     WEATHER SERVICE
35          70    D     AIRFIELD SERVICES
36          72    A     AIR CONTROL, AIR SUPPORT AND ANTI-AIR WARFARE
37          73    A     AIR TRAFFIC CONTROL
38          75    C     PILOTS AND NAVAL FLIGHT OFFICERS
39          99    E     IDENTIFYING MOS AND REPORTING MOS



A structural zero is a cell whose inventory is always zero because certain grade
and length-of-service combinations never appear in that military occupational
specialty (MOS). For example, a colonel with 5 years of service (in any MOS) or a
warrant officer in MOS 03 does not exist. The effect of these structural zero
categories is summarized in Table 3.
TABLE 3
STRUCTURAL ZERO CATEGORIES

Stru. Zero   Grades               Number    Zeroes    Total Zeroes
Category     within MOS           of MOS    per MOS   per Cat.
A            WO1 ... LTCOL          23        129        2967
B            WO1 ... CWO4, LDO       8        159        1272
C            2LT ... LTCOL           3        202         606
D            WO1 ... CWO4            5        237        1185
E            WO1 ... COL             1        119         119
TOTAL                               40                   6149
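The entries of Table 3 are internally consistent, and they account for the statement in Chapter I that about half of the 12,400 MOS-LOS-grade cells are unoccupied for structural reasons. A small arithmetic check, with the numbers transcribed from Table 3 and a hypothetical function name:

```python
# Transcribed from Table 3: category -> (number of MOS, structural zeros per MOS)
TABLE_3 = {"A": (23, 129), "B": (8, 159), "C": (3, 202), "D": (5, 237), "E": (1, 119)}
CELLS_PER_MOS = 31 * 10   # 31 LOS cells x 10 grades


def structural_zero_summary(table):
    """Return per-category totals, number of MOS, total structural zeros and
    the number of cells that remain available for estimation."""
    per_category = {cat: n_mos * zeros for cat, (n_mos, zeros) in table.items()}
    n_mos = sum(n for n, _ in table.values())
    total_zeros = sum(per_category.values())
    remaining = n_mos * CELLS_PER_MOS - total_zeros
    return per_category, n_mos, total_zeros, remaining


per_category, n_mos, total_zeros, remaining = structural_zero_summary(TABLE_3)
```

The 6,251 cells left after removing the 6,149 structural zeros are the ones that can enter the regressions.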



B. HOW TO BUILD THE LOGISTIC REGRESSION MODEL WITH USMC DATA

1. Introduction 

The purpose of this study is to develop the logistic regression model for
estimating USMC officer attrition rates using length of service (LOS) and grade (GR)
as carrier variables. The logistic regression model for the estimation of USMC officer
attrition rates can be formulated as

$$ \theta = \mathrm{logit}(p) = \beta_1 + \beta_2(\mathrm{LOS}) + \beta_3(\mathrm{GR}) $$

In matrix notation, this can be written as

$$ \theta = X\beta $$

where $X$ is an $N \times m$ matrix, also called the design space, and $\beta$ is the $m \times 1$ vector of
regression coefficients. Then $\theta = \mathrm{logit}(p)$ is an $N \times 1$ vector.
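To make the model concrete, the sketch below evaluates $\theta$ and the attrition rate $p = 1/(1 + e^{-\theta})$ for a given LOS and grade, using the coefficients reported for the numerical example in Section C (Tables 5 and 6). The function names are hypothetical, and the coefficients apply only to that MOS and segment.

```python
import math

# Coefficients reported for the numerical example of Section C
BETA = (0.548539, -0.17092, -0.20117)


def attrition_logit(los, grade, beta=BETA):
    """theta = beta_1 + beta_2 * LOS + beta_3 * GR."""
    return beta[0] + beta[1] * los + beta[2] * grade


def attrition_rate(los, grade, beta=BETA):
    """Inverse logit: p = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-attrition_logit(los, grade, beta)))


odds_ratio_per_year = math.exp(BETA[1])    # effect of one more year of service
odds_ratio_per_grade = math.exp(BETA[2])   # effect of one grade step
```

Because the model is linear on the logit scale, holding grade fixed, one additional year of service multiplies the odds $p/(1 - p)$ by $e^{\hat\beta_2} \approx 0.843$, and one grade step multiplies them by $e^{\hat\beta_3} \approx 0.818$; this is the sense in which the partial regression coefficients indicate the direction and rough size of a change.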

2. How to create the design space 

Each MOS is taken individually for the estimation of officer attrition rates.
Every MOS has dimension 31×10, for 31 LOS values and 10 grades. The LOS and grade
ranges must be broken into segments, and each segment is a separate regression. As an
example, any MOS can be broken into the four segments of Table 4. Each segment has
its own $X$ matrix. Each design space $X$ has dimension $N \times m$, where $N$ stands for the
number of independent binomial responses and $m$ stands for the number of explanatory
variables, which is always three in our case. This $X$ matrix can be written

$$ X_{(N \times 3)} = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{N1} & x_{N2} & x_{N3} \end{pmatrix} $$

whose three columns are CNT, LOS and GR, where CNT means constant: the first column
of $X$, which is always one ($x_{i1} = 1$).
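A minimal sketch of building one segment's design matrix, assuming LOS varies in the outer loop and grade in the inner loop (the ordering used in Table 5) and that the cells to be dropped are supplied as a set of (LOS, GR) pairs. As an example it rebuilds the X columns of Table 5 (LOS 5 to 14, grades 4 to 7) by dropping the nine (LOS, GR) combinations in that range that do not appear in the table. The function name `design_matrix` is hypothetical; the thesis's own APL functions are in Appendix A.

```python
def design_matrix(los_range, grade_range, dropped):
    """Rows [1, LOS, GR] (CNT, LOS, GR) for every cell in the segment not in `dropped`."""
    return [[1, los, gr]
            for los in los_range
            for gr in grade_range
            if (los, gr) not in dropped]


# (LOS, GR) combinations in LOS 5..14, GR 4..7 that are absent from Table 5
ABSENT = {(5, 7), (6, 7), (7, 7), (8, 4), (8, 7), (9, 7), (12, 4), (13, 4), (14, 4)}
X = design_matrix(range(5, 15), range(4, 8), ABSENT)
```

The resulting matrix has $N = 31$ rows and $m = 3$ columns, which gives the $31 - 3 = 28$ degrees of freedom quoted for the fit in Section C.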





TABLE 4
SEGMENTS

LOS                GRADE
18 ≤ LOS ≤ 30      8, 9 (LTCOL, COL)
8 ≤ LOS ≤ 19       5, 6, 7, 8 (FIRST LT, CAPT, MAJ, LTCOL)
4 ≤ LOS ≤ 10       4, 5, 6 (SECOND LT, FIRST LT, CAPT)
0 ≤ LOS ≤ 5        1, 2, 3, 4 (CWO-2, CWO-3, CWO-4, SECOND LT)



C. A NUMERICAL EXAMPLE FROM THE USMC DATA 

As an illustration of the standard output from a maximum likelihood fit and of the
use of the logistic regression model, we use the case where the military occupational
specialty (MOS) = 20 (motor transport, from Table 2), length of service (LOS) = 5 to 14
years and grades = 4, 5, 6, 7 (second lieutenant, first lieutenant, captain and major,
from Table 1). The data are listed in Table 5. They were obtained using the APL data
manipulation functions described in detail in Appendix A.

In Table 5, the 7 structural zero inventory cells have been dropped before applying
the fitting procedure. The output listed in Table 6 is obtained using the APL logistic
regression functions in Appendix A. The estimated regression coefficients are

$$ \hat\beta_1 = 0.548539, \qquad \hat\beta_2 = -0.17092, \qquad \hat\beta_3 = -0.20117 $$
TABLE 5
DATA

          X              LOSS     CENTRAL INVENTORY
CNT    LOS    GR         y_i            n_i
 1      5      4          0             4.5
 1      5      5          6            33.5
 1      5      6          1             3
 1      6      4          0             2.5
 1      6      5          5            19
 1      6      6          0             9
 1      7      4          0             1.5
 1      7      5          1             5.5
 1      7      6          2            13
 1      8      5          3             3
 1      8      6          1            14
 1      9      4          0             1
 1      9      5          3             4.5
 1      9      6          2            12.5
 1     10      4          0             1
 1     10      5          0             3.5
 1     10      6          0            12
 1     10      7          0             0.5
 1     11      4          0             0.5
 1     11      5          1             7
 1     11      6          0             5
 1     11      7          0             3
 1     12      5          0             7
 1     12      6          1             5
 1     12      7          0             4
 1     13      5          0            10.5
 1     13      6          0             4.5
 1     13      7          0             3
 1     14      5          0            10
 1     14      6          1             7
 1     14      7          0             4



The deviance for the fit is 46.5863 on 28 degrees of freedom, and the corresponding
chi-squared statistic is 46.4579. Both exceed their asymptotic expectation of 28, which
points to some lack of fit; most of the excess, however, comes from two observations,
the 10th and the 13th, whose chi-squared components in Table 6 (4.3133 and 3.5753)
together contribute about 31 to the chi-squared statistic. In Table 6, $\chi_i$ is the individual
component of $\chi^2$, $d_i$ is the component of deviance and $m_{ii}$ is the diagonal element of the
projection matrix $M$. The examination of $\chi_i$, $d_i$ and $m_{ii}$ calls attention to outlying and
influential points. The individual components of $\chi^2$ and of the deviance ($d_i$) are displayed
as logistic probability plots in Figure 3.1. Evidently, the 10th and 13th observations are
not well fit by the model; their $\chi_i$ and deviance residuals deviate from the straight-line
configuration of the others. Also, fitted values are plotted against the components of the
deviance and the components of $\chi^2$ in Figure 3.2. Index plots of these building blocks
($\chi_i$ vs $i$, $d_i$ vs $i$ and $m_{ii}$ vs $i$) are shown in Figure 3.3.

TABLE 6
OUTPUT

 i   logit(y_i/n_i)     θ̂_i       χ_i       d_i      m_ii
 1         -          -1.1108   -1.2172    1.6005   0.8124
 2      -1.5224       -1.3120   -0.4678   -0.6487   0.5298
 3      -0.6931       -1.5131    0.6884    1.2806   0.9213
 4         -          -1.2817   -0.8329    1.1066   0.9054
 5      -1.0296       -1.4829    0.8775    0.9521   0.8210
 6         -          -1.6841   -1.2924    1.7506   0.8388
 7         -          -1.4526   -0.5923    0.7941   0.9459
 8      -1.5040       -1.6538    0.1356    0.5473   0.9595
 9      -1.7047       -1.8540    0.1957    0.5482   0.8368
10         -          -1.8247    4.3133    3.4417   0.9784
11      -2.5649       -2.0259   -0.5256   -1.0382   0.8666
12         -          -1.7945   -0.4076    0.5545   0.9633
13       0.6931       -1.9957    3.5753    2.3189   0.9625
14      -1.6582       -2.1969    0.7066    1.0379   0.8973
15         -          -1.9654   -0.3742    0.5120   0.9620
16         -          -2.1666   -0.6332    0.8713   0.9640
17         -          -2.3678   -1.0602    1.4660   0.9027
18         -          -2.5690   -0.1957    0.2716   0.9891
19         -          -2.1364   -0.2429    0.3340   0.9802
20      -1.7917       -2.3375    0.5116    1.0448   0.9114
21         -          -2.5387   -0.6283    0.8717   0.9561
22         -          -2.7399   -0.4401    0.6127   0.9429
23         -          -2.5085   -0.7548    1.0466   0.8942
24      -1.3862       -2.7096    1.2719    1.6268   0.9507
25         -          -2.9108   -0.4665    0.6511   0.9309
26         -          -2.6794   -0.8487    1.1804   0.8171
27         -          -2.8806   -0.5024    0.7008   0.9499
28         -          -3.0817   -0.3709    0.5187   0.9515
29         -          -2.8503   -0.7604    1.0603   0.8054
30      -1.7917       -3.0515    1.2450    1.5873   0.9133
31         -          -3.2527   -0.3932    0.5509   0.9381

(-: logit(y_i/n_i) is undefined because y_i = 0 or y_i = n_i.)
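The reported results can be cross-checked from Tables 5 and 6 alone. The sketch below (data transcribed from Table 5, coefficients as reported in Section C, hypothetical function name) recomputes the fitted probabilities and the chi-square components without refitting. If the transcription is right, the components should agree with the $\chi_i$ column of Table 6, their sum of squares should be close to the reported 46.4579, the two largest $|\chi_i|$ should belong to observations 10 and 13, and the fitted losses $\sum_i n_i\hat p_i$ should equal the 27 observed losses, as the first likelihood equation $\sum_i (y_i - n_i\hat p_i) = 0$ requires.

```python
import math

# Transcribed from Table 5: (LOS, GR, losses y_i, central inventory n_i)
TABLE_5 = [
    (5, 4, 0, 4.5), (5, 5, 6, 33.5), (5, 6, 1, 3.0),
    (6, 4, 0, 2.5), (6, 5, 5, 19.0), (6, 6, 0, 9.0),
    (7, 4, 0, 1.5), (7, 5, 1, 5.5), (7, 6, 2, 13.0),
    (8, 5, 3, 3.0), (8, 6, 1, 14.0),
    (9, 4, 0, 1.0), (9, 5, 3, 4.5), (9, 6, 2, 12.5),
    (10, 4, 0, 1.0), (10, 5, 0, 3.5), (10, 6, 0, 12.0), (10, 7, 0, 0.5),
    (11, 4, 0, 0.5), (11, 5, 1, 7.0), (11, 6, 0, 5.0), (11, 7, 0, 3.0),
    (12, 5, 0, 7.0), (12, 6, 1, 5.0), (12, 7, 0, 4.0),
    (13, 5, 0, 10.5), (13, 6, 0, 4.5), (13, 7, 0, 3.0),
    (14, 5, 0, 10.0), (14, 6, 1, 7.0), (14, 7, 0, 4.0),
]
BETA = (0.548539, -0.17092, -0.20117)   # reported estimates of beta_1, beta_2, beta_3


def pearson_from_coefficients(rows, beta):
    """Fitted probabilities and chi_i = (y_i - n_i p_i) / sqrt(n_i p_i (1 - p_i))."""
    p_hat, chi = [], []
    for los, gr, y, n in rows:
        theta = beta[0] + beta[1] * los + beta[2] * gr
        p = 1.0 / (1.0 + math.exp(-theta))
        p_hat.append(p)
        chi.append((y - n * p) / math.sqrt(n * p * (1.0 - p)))
    return p_hat, chi


p_hat, chi = pearson_from_coefficients(TABLE_5, BETA)
chi2 = sum(c * c for c in chi)
fitted_total = sum(n * p for (_, _, _, n), p in zip(TABLE_5, p_hat))   # compare with sum of y_i
worst = sorted(range(len(chi)), key=lambda i: abs(chi[i]), reverse=True)[:2]
print(round(chi2, 2), [i + 1 for i in worst])
```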

We also selected some cases to examine whether the regression coefficients have
temporal stability. The estimated regression coefficients for the selected cases are listed
in Table 7.
TABLE 7
COEFFICIENTS OF REGRESSION FOR SOME CASES

MOS = 3 (INFANTRY)
SEGMENT                                β1          β2          β3
0 ≤ LOS ≤ 6 AND 4 ≤ GR ≤ 6          -5.786       0.037       0.764
3 ≤ LOS ≤ 9 AND 4 ≤ GR ≤ 6          -2.029      -0.212       0.245
9 ≤ LOS ≤ 19 AND 5 ≤ GR ≤ 8          4.714       0.047      -1.389
19 ≤ LOS ≤ 29 AND 7 ≤ GR ≤ 9        -1.376       0.191      -0.609

MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
SEGMENT                                β1          β2          β3
0 ≤ LOS ≤ 6 AND 4 ≤ GR ≤ 6          -5.900       0.037       0.827
3 ≤ LOS ≤ 9 AND 4 ≤ GR ≤ 6          -1.758      -0.129       0.129
9 ≤ LOS ≤ 19 AND 5 ≤ GR ≤ 8          3.846      -0.160      -0.845
19 ≤ LOS ≤ 29 AND 7 ≤ GR ≤ 9         0.021       0.150      -0.639

MOS = 13 (OPERATIONAL COMMUNICATIONS)
SEGMENT                                β1          β2          β3
0 ≤ LOS ≤ 6 AND 4 ≤ GR ≤ 6          -5.995       0.038       0.884
3 ≤ LOS ≤ 9 AND 4 ≤ GR ≤ 6          -1.188      -0.186       0.281
9 ≤ LOS ≤ 19 AND 5 ≤ GR ≤ 8          3.366      -0.117      -0.776
19 ≤ LOS ≤ 29 AND 7 ≤ GR ≤ 9        -0.783       0.178      -0.614

MOS = 20 (MOTOR TRANSPORT)
SEGMENT                                β1          β2          β3
0 ≤ LOS ≤ 6 AND 4 ≤ GR ≤ 6          -7.406      -0.089       1.249
3 ≤ LOS ≤ 9 AND 4 ≤ GR ≤ 6          -4.438      -0.066       0.646
9 ≤ LOS ≤ 19 AND 5 ≤ GR ≤ 8          1.866      -0.315      -0.135
19 ≤ LOS ≤ 29 AND 7 ≤ GR ≤ 9        -0.440       0.009      -0.101

MOS = 38 (PILOTS AND NAVAL FLIGHT OFFICERS)
SEGMENT                                β1          β2          β3
0 ≤ LOS ≤ 6 AND 4 ≤ GR ≤ 6         -10.1922     -0.0404      1.5493
3 ≤ LOS ≤ 9 AND 4 ≤ GR ≤ 6         -10.8841     -0.1476      1.7560
9 ≤ LOS ≤ 19 AND 5 ≤ GR ≤ 8          2.1225     -0.1663     -0.4317
19 ≤ LOS ≤ 29 AND 7 ≤ GR ≤ 9         0.3936      0.2257     -0.8984
D. VALIDATION OF MODEL

A validation test was conducted to evaluate the efficiency of the logistic
regression model for the estimation of USMC officer attrition rates. The test was
conducted as follows:

1. Select the LOSs and grades within a military occupational specialty. The
resulting array is three-dimensional (years, LOS, grades).

2. Let $i$ stand for LOS; then $i = 0, \ldots, 30$.

3. Let $j$ stand for GR; then $j = 0, \ldots, 9$.

4. Let $y_{ij}$ = number of leavers in cell $(i, j)$.

5. Let $m_{ij}$ = central inventory in cell $(i, j)$ = $\max\{(N(t) + N(t+1))/2,\ y_{ij}(t)\}$.

6. Let $t = 1, \ldots, T$, where $T$ is the number of years of data (i.e. from 1977 to 1983)
used to create the estimator.

The validation procedure used $t = 1, \ldots, 4$ (i.e. from 1977 to 1980) for the fitting
and $t = 5, 6, 7$ (i.e. from 1981 to 1983) for validation.

The following procedure was used to validate the effectiveness of the logistic
regression estimation process. We define an indicator variable

$$ D_{ij} = \begin{cases} 1 & \text{if } \hat p_{ij} \neq 0 \text{ or } 1 \\ 0 & \text{if } \hat p_{ij} = 0 \text{ or } 1 \end{cases} $$

Then

$$ K = \sum_i \sum_j D_{ij} $$

where $K$ is the number of nonstructural-zero cells. The validation test can then be
formulated as a chi-square goodness-of-fit statistic:

$$ \text{Chi-square MOE} = \sum_i \sum_j D_{ij}\, m_{ij}\, \frac{(\hat p_{ij} - \tilde p_{ij})^2}{\hat p_{ij}(1 - \hat p_{ij})} $$

where $\hat p_{ij}$ is found from the fit using the estimator years, $\tilde p_{ij} = y_{ij}/m_{ij}$ is obtained
from the validation year, and the central inventory $m_{ij}$ also comes from the validation
year. For our numerical example (MOS = 3, LOS = 5 through 14 and GR = 4, 5, 6, 7),
the validation test gives MOE values of 52.6998, 36.4182 and 30.6585 for the years
1981, 1982 and 1983, respectively.
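A minimal sketch of the chi-square MOE for one validation year, assuming the cells are keyed by (LOS, GR) pairs, that $\hat p_{ij}$ comes from the estimator years while $y_{ij}$ and $m_{ij}$ come from the validation year, and that cells with $\hat p_{ij}$ equal to 0 or 1 receive $D_{ij} = 0$. The function names and the two-cell input are hypothetical.

```python
def central_inventory(n_start, n_end, leavers):
    """m = max((N(t) + N(t+1)) / 2, y(t))."""
    return max((n_start + n_end) / 2.0, leavers)


def chi_square_moe(p_hat, leavers, central_inv):
    """Sum of D_ij * m_ij * (p_hat - p_tilde)^2 / (p_hat (1 - p_hat)) over cells.

    Returns (MOE, K), where K counts the cells with D_ij = 1.
    """
    moe, k = 0.0, 0
    for cell, ph in p_hat.items():
        if ph <= 0.0 or ph >= 1.0:
            continue                                    # D_ij = 0
        m = central_inv[cell]
        # observed rate y_ij / m_ij; a cell with m_ij = 0 contributes nothing
        p_tilde = leavers[cell] / m if m > 0 else 0.0
        moe += m * (ph - p_tilde) ** 2 / (ph * (1.0 - ph))
        k += 1
    return moe, k


# Hypothetical two-cell example keyed by (LOS, GR)
p_hat = {(5, 4): 0.2, (5, 5): 0.1}
leavers = {(5, 4): 1, (5, 5): 0}
central = {(5, 4): central_inventory(11, 9, 1), (5, 5): central_inventory(22, 18, 0)}
moe, k = chi_square_moe(p_hat, leavers, central)
```

Each cell's term is the squared difference between fitted and observed rates, scaled by the binomial variance implied by the fitted rate, so large cells with poorly predicted rates dominate the MOE; smaller values indicate better agreement.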
Figure 3.1 Probability plots of $\chi_i$ and $d_i$.
Figure 3.2 Plots of fitted values vs. $\chi_i$ and $d_i$.
Figure 3.3 Index plots of $\chi_i$, $d_i$ and $m_{ii}$.






E. COMPARISON OF THE FIGURES OF MERIT 

In this section, we compare our figures of merit with Major Robinson's [Ref. 2] results. As mentioned before, he used limited translation shrinkage estimation (LTSE) to estimate USMC officer attrition rates, whereas we have used a different estimation method on the same manpower data. He also used the procedure explained in the previous section to validate the effectiveness of the limited translation shrinkage estimator. To compare the figures of merit of logistic regression and shrinkage estimation, we present results for selected cases in Tables 8 and 9.

Looking at the tables, we see that shrinkage estimation does better than logistic regression estimation for most of the selected cases (a smaller MOE indicates closer agreement between fitted and observed rates). We cannot say, however, that limited translation shrinkage estimation is much better than logistic regression. The results are very close for some cases, and logistic regression is sometimes better than shrinkage estimation (e.g. for MOS = 20, 3 ≤ LOS ≤ 9 and 4 ≤ GR ≤ 6).
TABLE 8
FIGURES OF MERIT

(0 ≤ LOS ≤ 6) AND (4 ≤ GR ≤ 6)

                             1981       1982       1983
MOS = 3 (INFANTRY)
    LTSE                  27.8528    42.4799    45.9140
    REGRESSION            59.8577    88.5361    86.6193
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                  13.2892    18.8664    20.7735
    REGRESSION            35.3195    31.3636    27.6810
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                  22.4989    16.1496    13.5038
    REGRESSION            41.7272    31.5084    30.6847
MOS = 20 (MOTOR TRANSPORT)
    LTSE                  15.9591    34.4740    17.8570
    REGRESSION            24.4329    28.3449    22.5246

(3 ≤ LOS ≤ 9) AND (4 ≤ GR ≤ 6)

                             1981       1982       1983
MOS = 3 (INFANTRY)
    LTSE                  19.1602    67.2562    34.1118
    REGRESSION            73.0644    89.0204    61.9981
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                  20.5515    19.8988    18.2333
    REGRESSION            60.5127    40.1607    26.2687
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                  20.3665    15.3913    17.6670
    REGRESSION            28.6348    25.9982    32.2280
MOS = 20 (MOTOR TRANSPORT)
    LTSE                  22.3545    52.2840    35.5580
    REGRESSION            26.1725    31.6402    19.7830
TABLE 8
FIGURES OF MERIT (CONT'D.)

(9 ≤ LOS ≤ 19) AND (5 ≤ GR ≤ 8)

                             1981       1982       1983
MOS = 3 (INFANTRY)
    LTSE                  84.5388    70.3422    40.2220
    REGRESSION           149.5783    61.7802    41.9882
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                  42.4237    22.9296    17.3584
    REGRESSION            84.4140    48.6112    24.7120
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                  48.3150    25.9520    26.6658
    REGRESSION           108.1312    41.2197    37.5635
MOS = 20 (MOTOR TRANSPORT)
    LTSE                  20.5629    24.6164    16.2029
    REGRESSION            41.8773    44.0796    33.7604

(19 ≤ LOS ≤ 29) AND (7 ≤ GR ≤ 9)

                             1981       1982       1983
MOS = 3 (INFANTRY)
    LTSE                  30.0620    18.9604    29.1716
    REGRESSION            46.3861    28.9819    32.3470
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
    LTSE                  21.8423    25.2194    34.9758
    REGRESSION            28.3865    33.0140    35.8610
MOS = 13 (OPERATIONAL COMMUNICATIONS)
    LTSE                  46.9617    20.6439    10.8807
    REGRESSION            77.5956    36.2923    21.5748
MOS = 20 (MOTOR TRANSPORT)
    LTSE                  12.5150    15.5716    12.9169
    REGRESSION            23.2035    27.9930    31.8230



IV. CONCLUSIONS AND RECOMMENDATIONS 



A. CONCLUSIONS 

Recall that the logistic function and its inverse can be expressed as

$$\theta = \ln\{p/(1-p)\} \quad \text{and} \quad p = e^{\theta}/(1 + e^{\theta}).$$

Further, it is useful to record

$$dp/d\theta = e^{\theta}/(1 + e^{\theta})^{2} = p(1-p).$$

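Identifying p as the attrition rate, we can use a first-order Taylor approximation to the change in rates. Thus,

$$\Delta p \approx p(1-p)\,\{\beta_2\,\Delta\mathrm{LOS} + \beta_3\,\Delta\mathrm{GR}\}$$

provides us with a linear approximation to the direction and amount of change, where $\beta_2$ and $\beta_3$ are the partial regression coefficients of LOS and GR.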
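To see how good the linear approximation is, the following Python sketch compares it with the exact change in the logistic rate $p = e^{\theta}/(1+e^{\theta})$, $\theta = \beta_1 + \beta_2\,\mathrm{LOS} + \beta_3\,\mathrm{GR}$. The coefficient values are hypothetical and chosen only for illustration; they are not taken from Table 7, and the function names are ours.

```python
import math


def attrition_rate(beta, los, gr):
    """Logistic rate p = e^theta / (1 + e^theta), theta = b1 + b2*LOS + b3*GR."""
    b1, b2, b3 = beta
    theta = b1 + b2 * los + b3 * gr
    return 1.0 / (1.0 + math.exp(-theta))  # same as e^theta / (1 + e^theta)


def linear_change(beta, los, gr, d_los, d_gr):
    """First-order approximation: dp ~ p(1 - p) * (b2*dLOS + b3*dGR)."""
    _, b2, b3 = beta
    p = attrition_rate(beta, los, gr)
    return p * (1.0 - p) * (b2 * d_los + b3 * d_gr)


# Hypothetical coefficients (illustration only).
beta = (-2.0, -0.1, 0.05)
approx = linear_change(beta, los=5, gr=4, d_los=1, d_gr=0)
exact = attrition_rate(beta, 6, 4) - attrition_rate(beta, 5, 4)
print(f"approximate change {approx:.5f}, exact change {exact:.5f}")
```

For a one-year step in LOS both numbers have the same sign and differ only in the fourth decimal place. This is why the partial coefficients are useful for rough, ad hoc estimates of direction and size of change.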

Although the logistic regression approach does not improve upon the attrition rate estimators developed by Tucker [Ref. 1] and Robinson [Ref. 2], it does point to the direction of change as one varies LOS and GR. To this end, it was necessary to partition the 30-year LOS range into segments. It is an exercise in curiosity to speculate as to the reasons for the observed behavior in these segments. Here is our offering:

1. 0 ≤ LOS ≤ 5: attrition rates are chaotic as young officers "test the waters".

2. 3 ≤ LOS ≤ 9: attrition rates decline with increasing LOS as officers commit themselves to longer second and third contracts. One would think that advancement in grade would also correlate with a lower rate, but we do not see that in Table 8; presumably other kinds of shifts are also influencing the attrition behavior in these years.

3. 9 ≤ LOS ≤ 19: the maturing career commitment has been made, and rates decline with increasing LOS and GR.

4. 19 ≤ LOS ≤ 30: since the advancement opportunities of senior officers are limited, we see rates increasing with LOS and decreasing with advances in grade.

B. RECOMMENDATIONS 

The linear approximation to the effect of change would be most useful if we could group the MOS categories into sets with common regression coefficients and if these coefficients were stable over time. Pursuing each of these contingencies requires additional work and an expanded data base. The programs developed in this thesis serve as a foundation for extension.
APPENDIX A
APL FUNCTIONS 



1. GENERAL 

This appendix contains APL functions for the data manipulation, the logistic regression and the validation of the model. The original data are on a magnetic tape named COUNTS prepared by the Navy Personnel Research and Development Center (NPRDC). Robinson [Ref. 2] explained the conversion of the raw data from tape to an APL workspace. To obtain the LOSSXX (losses) and INVXX (inventories) arrays, the procedure should be followed in the order presented by Robinson. "XX" is the applicable fiscal year (e.g. 77 for fiscal year 1977).

2. DATA MANIPULATION FUNCTIONS 

Several APL functions were developed by Tucker and Robinson for the data manipulation and the execution of calculations pertaining to the processes under evaluation. These functions are summarized in the following section. We use some of them in this project: GETINV, INVMATX, GETLOSS and MATRIX. Two more APL functions were also written to prepare the data for the logistic regression and the validation.

a. Creating the inventory and loss arrays

Using the INVXX arrays and the APL functions GETINV in Figure A.1 and INVMATX in Figure A.2, create the arrays IXX. Note that GETINV calls INVMATX and INVMATX uses the INVXX arrays. APL workspace size limitations may be a problem because of the large amount of data. It may be necessary to create one or two arrays at a time and copy them to another workspace.

The LXX arrays are created in a similar manner, using the APL functions GETLOSS in Figure A.3 and MATRIX in Figure A.4. The APL function MATRIX uses the loss arrays LOSSXX. The resulting matrices are "IXX" and "LXX" for fiscal year "XX". The functions INVMATX and MATRIX could create a matrix of dimension 7×40×10×31 for 7 years, 40 MOS's, 10 grades and 31 LOS's. However, because of limited workspace, the dimension 40×31×10 for 40 MOS's, 31 LOS's and 10 grades was commonly used.
∇ GETINV
[1]  ⍝ THIS FUNCTION CALLS THE FUNCTION INVMATX
[2]  ⍝ FOR EACH FISCAL YEAR. IXX IS THE INVENTORY
[3]  ⍝ ARRAY FOR FISCAL YEAR XX BY OF/LOS/GRADE.
[4]  I77←INVMATX INV77
[5]  I78←INVMATX INV78
[6]  I79←INVMATX INV79
[7]  I80←INVMATX INV80
[8]  I81←INVMATX INV81
[9]  I82←INVMATX INV82
[10] I83←INVMATX INV83
[11] 'SHAPE OF I77 IS '
[12] ⍴I77
∇

Figure A.1 APL Function GETINV.
∇ Z←INVMATX X;A;B;C;D;E;F;I;J
⍝ CREATES THE INVENTORY ARRAYS FOR THE FISCAL
⍝ YEARS USING THE ARRAYS OF INDEXES INVXX.
⍝ INVXX MUST BE A CHARACTER VECTOR OF 9 DATA
⍝ ENTRIES FOLLOWED BY 1 BLANK FOR EACH LOOP.
Z←(40 31 10)⍴0
I←⍴X
J←(I+1)÷10
LOOP:→(J=0)/OUT
A←⍎(1↑X)
B←1+(⍎(2↑X←(1↓X)))
C←1+(⍎(1↑X←(2↓X)))
D←1+(⍎(2↑X←(1↓X)))
E←⍎(3↑X←(2↓X))
Z[B;D;C]←E
X←(4↓X)
J←J-1
→LOOP
OUT:'FINISHED -- SHAPE OF MATRIX IS '
⍴Z
∇

Figure A.2 APL Function INVMATX.
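The record layout that INVMATX walks through can be mirrored in Python to make the field boundaries explicit. The layout below is inferred from the listing and is an assumption: each 10-character record holds a 1-character field (read into A but not used), a 2-digit MOS, a 1-digit grade, a 2-digit LOS and a 3-digit count, followed by one blank. The function name `inventory_array` is hypothetical.

```python
import numpy as np

RECORD_WIDTH = 10  # 9 data characters followed by 1 blank (assumed layout)


def inventory_array(records, n_mos=40, n_los=31, n_grade=10):
    """Build a MOS x LOS x grade inventory array from an INVXX-style string."""
    z = np.zeros((n_mos, n_los, n_grade), dtype=int)
    for start in range(0, len(records), RECORD_WIDTH):
        rec = records[start:start + 9]
        if not rec.strip():
            continue  # tolerate trailing blanks
        # rec[0] is the 1-character field that INVMATX reads into A and ignores
        mos = int(rec[1:3])
        grade = int(rec[3])
        los = int(rec[4:6])
        count = int(rec[6:9])
        # Zero-based indices replace the APL "1+" offsets (index origin 1).
        z[mos, los, grade] = count
    return z


z = inventory_array("103412005 120015123")
print(z[3, 12, 4], z[20, 15, 0])
```

MATRIX reads the LOSSXX records the same way up to the LOS field. According to its listing, it then takes a 1-character field and a 2-digit count instead of a 3-digit count, and it adds the count into the cell rather than assigning it, so repeated records accumulate.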

b. Manipulation of the data for regression and validation 

The function GETCENINV in Figure A.5 creates the central inventory arrays, assigned to CIXX, for the fiscal years 1977 to 1983. GETCENINV uses the global variables "IXX" and "LXX" for the inventory and loss matrices, respectively, for fiscal year "XX".
∇ GETLOSS
[1]  ⍝ THIS FUNCTION CALLS MATRIX FOR EACH FISCAL
[2]  ⍝ YEAR. LXX IS THE LOSS ARRAY FOR FISCAL YEAR
[3]  ⍝ XX BY OF/LOS/GRADE.
[4]  L77←MATRIX LOSS77
[5]  L78←MATRIX LOSS78
[6]  L79←MATRIX LOSS79
[7]  L80←MATRIX LOSS80
[8]  ⍝ L81←MATRIX LOSS81
[9]  ⍝ L82←MATRIX LOSS82
[10] ⍝ L83←MATRIX LOSS83
∇

Figure A.3 APL Function GETLOSS.
∇ Z←MATRIX X;A;B;C;D;E;F;I;J
⍝ THIS FUNCTION CREATES THE LOSS ARRAY FOR THE FISCAL
⍝ YEARS USING THE ARRAY OF LOSS INDICES LOSSXX. IT IS
⍝ CALLED BY GETLOSS. LOSSXX MUST BE A CHARACTER VECTOR
⍝ WITH 9 DATA ENTRIES FOLLOWED BY 1 BLANK FOR EACH LOOP.
Z←(40 31 10)⍴0
I←⍴X
J←(I+1)÷10
LOOP:→(J=0)/OUT
A←⍎(1↑X)
B←1+(⍎(2↑X←(1↓X)))
C←1+(⍎(1↑X←(2↓X)))
D←1+(⍎(2↑X←(1↓X)))
E←⍎(1↑X←(2↓X))
F←⍎(2↑X←(1↓X))
Z[B;D;C]←Z[B;D;C]+F
X←(3↓X)
J←J-1
→LOOP
OUT:'FINISHED -- SHAPE OF MATRIX IS '
⍴Z
∇

Figure A.4 APL Function MATRIX.

The function GETDATA in Figure A.6 prepares the data for the regression and validation procedures. The outputs IEST and LEST are the sums of CIXX and LXX, respectively, over the fiscal years 1977 to 1980; i.e. the first four years are used for estimation. "IVALXX" and "LVALXX" are CIXX and LXX, respectively, for the fiscal years 1981 to 1983; i.e. the last three years are used for the validation procedure. The function GETDATA uses the global variables CIXX and LXX for the central inventory matrix and loss matrix for fiscal year "XX".

∇ GETCENINV
⍝ GET THE CENTRAL INVENTORY DATA FOR
⍝ THE FISCAL YEARS FROM 1977 TO 1983
CI77←((I77+I78)÷2)⌈L77
CI78←((I78+I79)÷2)⌈L78
CI79←((I79+I80)÷2)⌈L79
CI80←((I80+I81)÷2)⌈L80
CI81←((I81+I82)÷2)⌈L81
CI82←((I82+I83)÷2)⌈L82
CI83←I83⌈L83
∇

Figure A.5 APL Function GETCENINV.

∇ GETDATA
[1]  ⍝ MANIPULATE TO USE IN REGRESSION
[2]  ⍝ AND VALIDATION PROCEDURES
[3]  IEST←CI77+CI78+CI79+CI80
[4]  LEST←L77+L78+L79+L80
[5]  IVAL81←CI81
[6]  IVAL82←CI82
[7]  IVAL83←CI83
[8]  LVAL81←L81
[9]  LVAL82←L82
[10] LVAL83←L83
∇

Figure A.6 APL Function GETDATA.
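The estimation/validation split that GETDATA performs can be sketched in Python. The sketch assumes that the yearly central inventory and loss arrays have been stacked along a leading year axis (1977, 1978, ...). That stacking and the name `split_estimation_validation` are assumptions made for illustration. The first four years are pooled by summation, and the remaining years are kept separately for validation.

```python
import numpy as np


def split_estimation_validation(ci, loss, n_est=4):
    """Pool the first n_est years for fitting; keep the rest year by year.

    ci, loss : arrays with a leading year axis (1977, 1978, ...).
    Returns (IEST, LEST, IVAL, LVAL), where IVAL[k] and LVAL[k] hold the
    central inventory and losses of the k-th validation year.
    """
    ci = np.asarray(ci)
    loss = np.asarray(loss)
    iest = ci[:n_est].sum(axis=0)    # analogous to CI77+CI78+CI79+CI80
    lest = loss[:n_est].sum(axis=0)  # analogous to L77+L78+L79+L80
    return iest, lest, ci[n_est:], loss[n_est:]
```

With seven years of data, IVAL[0], IVAL[1] and IVAL[2] play the roles of IVAL81, IVAL82 and IVAL83.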

c. Why the central inventory? 

A problem arises on several occasions when the data are disaggregated to a level at which the inventory is very small. For example, the inventory in a particular fiscal year can be zero for some length of service (LOS) and military occupational specialty (MOS) combination, and the inventory in the next fiscal year for the same combination may also be zero. The problem arises when the number of leavers in such a cell is one or more.








∇ LOGISTIC
[1]  ⍝ THIS IS THE MAIN FUNCTION FOR THE REGRESSION DIAGNOSTICS
[2]  ⍝ AND THE VALIDATION. THIS FUNCTION CALLS THE FUNCTIONS
[3]  ⍝ FITTED, RESIDUAL AND VALIDATION, WHICH MUST ALL BE
[4]  ⍝ IN THE SAME APL WORKSPACE.
[5]  FITTED
[6]  RESIDUAL
[7]  VALIDATION
[8]  ⎕PP←8
[9]  'WOULD YOU LIKE TO SEE RES, FITTED VALUES AND BETAHAT'
[10] '0:NO 1:YES'
[11] KK←⎕
[12] →(KK=0)/L14
[13] 'BETAHAT IS '
[14] BETA
[15] 'VECTOR OF FITTED VALUES'
[16] TETHAT
[17] 'VECTOR OF COMPONENTS OF DEVIANCE IS '
     'CHI-SQUARE TEST STATISTIC IS ',⍕CHI
[23] L14:'WOULD YOU LIKE TO SEE THE VALIDATION RESULTS'
[24] '0:NO 1:YES'
[27] 'CHI-SQUARE MOE FOR THE VALIDATION'
[28] ' 1981 1982 1983'
[29] CHISQ
[30] 'DEGREES OF FREEDOM IS ',⍕DEF
[32] L15:'WOULD YOU LIKE TO RUN FOR ANOTHER CASE'
[33] '0:NO 1:YES'
[34] TT←⎕
[35] →(TT=0)/0
[36] LOGISTIC
∇

Figure A.7 APL Function LOGISTIC.

This can occur because the inventory figures refer to the instant at the beginning of the fiscal year, while the loss figures refer to any time during the year; i.e. an officer can both access and attrite at any time during the year. Then p (= y/n), where y is the number of leavers and n is the inventory at time t, would be ambiguous: it is undefined when n = 0 and exceeds 1 when y > n.

To remove this ambiguity from the data, the following policy was adopted to define the central inventory of the officer force at disaggregated levels, for any cell or collection of cells.

1. Let t = 1, ..., 6 refer to the years 1977, ..., 1982.

2. Let Y(t) = number of losses in year t.

3. Let INV(t) = inventory at the beginning of year t.
∇ FITTED
⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
⍝ COEFFICIENTS, FITTED VALUES OF THE LOGISTIC
⍝ REGRESSION.
'ENTER MOS'
MOS←⎕
'ENTER LOS'
LOS←⎕
'ENTER GR'

GR←⎕
INV1←IEST[(1+MOS);(1+LOS);(1+GR)]
LOSS1←LEST[(1+MOS);(1+LOS);(1+GR)]
K←⍴(,INV1)
X←⍉((3,K)⍴(K⍴1),(,⍉(((⍴GR),(⍴LOS))⍴LOS)),(K⍴GR))
X1←X
EP←1E¯8
N1←(K,1)⍴(,INV1)
Y1←(K,1)⍴(,LOSS1)
J←((,N1
X1←J⌿X1
N1←J⌿N1
Y1←J⌿Y1
BETA←((1↓(⍴X1))
L2:BETA1←BETA
Dpo



S<-I1 -/VI xPffAZV ( ( *TETHAT ) * ( 1 + ( ) ) ) 

Vl+(Nlx (*TETHAT ) )* ( (1+(*TETHAT) )*2 ) 

AM-py+,71 

y+(((wjv) P y))x{iw)o. = ( t tf) 

BSM-e-BEIA + C ( (i( ( (S)X1 ) + .x7) + .xXl) ) + .x(!S)Xl ))+.xS) 
R*-+/ (BETA-BETA1 ) 

+L2*\EP< \R-p .BETA 
TETHAT+X1+. xBETA 
I+(iN)°.=(\N) 

B«-(( (7*0.5 ) + .xXl) + .x(S((($Xl) + .xy)+.xXl))) 

MW- ( (£ + . x (iS)Xl ) ) + . x (y*o . 5 ) ) 

MD*-+/U(\N)° . = (\N))xMl) 

V 



Figure A.8 APL Function FITTED.



4. Let N(t) be the maximum of Y(t) and the average of the beginning inventories in years t and t+1, that is, $N(t) = \max\{Y(t),\ (\mathrm{INV}(t) + \mathrm{INV}(t+1))/2\}$. N(t) is the central inventory of year t. This provides the elements for a more accurate estimation of the attrition rate at the disaggregated level.
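The central-inventory rule can be sketched in Python for arrays that carry a leading year axis. For the final year there is no following inventory. Following GETCENINV's treatment of 1983, the sketch falls back to $\max\{Y(t), \mathrm{INV}(t)\}$. The function name is hypothetical.

```python
import numpy as np


def central_inventory(inv, loss):
    """N(t) = max{Y(t), (INV(t) + INV(t+1)) / 2}, cell by cell.

    inv, loss : arrays with a leading year axis. The last year has no
    INV(t+1), so (as GETCENINV does for 1983) max{Y(t), INV(t)} is used.
    """
    inv = np.asarray(inv, dtype=float)
    loss = np.asarray(loss, dtype=float)
    n = np.empty_like(inv)
    n[:-1] = np.maximum((inv[:-1] + inv[1:]) / 2.0, loss[:-1])
    n[-1] = np.maximum(inv[-1], loss[-1])
    return n


inv = [[0, 10], [0, 12], [4, 8]]   # two cells over three years
loss = [[2, 1], [1, 3], [0, 9]]
print(central_inventory(inv, loss))
```

In the first cell the beginning inventories are zero in consecutive years, yet losses occur. Taking the maximum with Y(t) keeps the rate y/n defined and never above 1.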



3. LOGISTIC REGRESSION AND VALIDATION FUNCTIONS

The following APL functions were used for the logistic regression and the validation of the model. These functions must be in the same APL workspace. They use the global variables IEST, LEST, IVAL81, IVAL82, IVAL83, LVAL81, LVAL82 and LVAL83, which are the outputs of the function GETDATA.
∇ RESIDUAL
⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
⍝ RESIDUAL VECTORS OF THE REGRESSION.
H←((,Y1)≠0)∧((,Y1)≠,N1)
NH←H/,N1
YH←H/,Y1
P1←YH÷NH
TETHA←⍟(P1÷(1-P1))
TH←H/,TETHAT
DEV-*- 2 x (TH-TETHA ) 
DEV1←(⍴TETHAT)⍴H\DEV
U←(,Y1)=0
NU←U/,N1
PHATU←U/,PHAT
Alt- 2*NUx(.®Mt-(l-PHATU )) 
A1←(⍴TETHAT)⍴U\A1
DEV2←DEV1+A1



Z←(,Y1)=,N1
NZ←Z/,N1
PHATZ←Z/,PHAT
A2+ 2*NZ*{®PHATZ) 
A2←(⍴TETHAT)⍴Z\A2
DEV←DEV2+A2
D←+/(|DEV)

Cl«-D£7<0 

C2«-PE’72iO 

Dff7t-(C2-Cl)x(( DE7)*0.5) 
TETA←(⍴TETHAT)⍴H\TETHA
VAR←N1×PHAT×(1-PHAT)
CHI←+/((S*2)÷VAR)
CHICOM←S÷(VAR*0.5)

V 



Figure A.9 APL Function RESIDUAL.

a. Function LOGISTIC 

The APL function LOGISTIC in Figure A.7 is the main function for the regression and validation calculations. It calls the functions FITTED, RESIDUAL and VALIDATION. These functions cannot be run alone; they must be run by LOGISTIC. In other words, they are subfunctions of the main function LOGISTIC. These subfunctions are discussed below.

b. Function FITTED 

The APL function FITTED in Figure A.8 computes the regression coefficients and the fitted values of the regression. This function uses the global variables "IEST" and "LEST".
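FITTED appears to compute the coefficients by a Newton-Raphson iteration (the update BETA + (X'VX)⁻¹X'S with S = y − n·p̂) on grouped binomial cells, using the design columns (1, LOS, GR). A generic Python sketch of that maximum-likelihood fit follows. It is not a transcription of the APL. The function name, the convergence test on the largest step and the tolerance are our choices. The sketch also returns the diagonal of the projection matrix $V^{1/2}X(X'VX)^{-1}X'V^{1/2}$, which corresponds to the quantity reported as MD.

```python
import numpy as np


def fit_grouped_logistic(los, gr, y, n, tol=1e-8, max_iter=50):
    """Maximum-likelihood logistic regression for grouped (binomial) cells.

    los, gr : LOS and grade of each cell; y : leavers; n : central inventory.
    Returns (beta, theta_hat, leverage) for the cells with n > 0.
    """
    los, gr, y, n = (np.asarray(a, dtype=float) for a in (los, gr, y, n))
    keep = n > 0                                  # drop empty cells
    X = np.column_stack([np.ones(keep.sum()), los[keep], gr[keep]])
    y, n = y[keep], n[keep]
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = n * p * (1.0 - p)                     # diagonal of V
        info = X.T @ (w[:, None] * X)             # X'VX
        step = np.linalg.solve(info, X.T @ (y - n * p))
        beta = beta + step                        # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    theta = X @ beta
    p = 1.0 / (1.0 + np.exp(-theta))
    w = n * p * (1.0 - p)
    xs = np.sqrt(w)[:, None] * X                  # V^(1/2) X
    h = xs @ np.linalg.inv(X.T @ (w[:, None] * X))
    leverage = np.sum(h * xs, axis=1)             # diagonal of the hat matrix
    return beta, theta, leverage
```

Because the log-likelihood is concave, starting from β = 0 is usually adequate for cells like those in Table 8. The leverages always sum to the number of coefficients (three here), which is a convenient check.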

c. Function RESIDUAL 

The APL function RESIDUAL in Figure A.9 calculates the arrays of residuals. This function is a continuation of the function FITTED.
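The quantities RESIDUAL reports are the components of the deviance, the components of chi-square, the total deviance D and the chi-square statistic CHI. They follow the standard binomial definitions, sketched in Python below. The sketch handles cells with y = 0 or y = n through the convention 0·ln 0 = 0, which appears to be the same case split the APL function makes with its U and Z masks. The function names are ours.

```python
import numpy as np


def _xlogy_ratio(a, b):
    """a * ln(a / b), with the convention 0 * ln 0 = 0."""
    out = np.zeros_like(a)
    pos = a > 0
    out[pos] = a[pos] * np.log(a[pos] / b[pos])
    return out


def residual_components(y, n, p_hat):
    """Signed deviance components, Pearson (chi) components and totals."""
    y, n, p_hat = (np.asarray(a, dtype=float) for a in (y, n, p_hat))
    mu = n * p_hat                                # expected leavers
    g2 = 2.0 * (_xlogy_ratio(y, mu) + _xlogy_ratio(n - y, n - mu))
    dev = np.sign(y - mu) * np.sqrt(np.maximum(g2, 0.0))
    chi = (y - mu) / np.sqrt(mu * (1.0 - p_hat))  # (y - np) / sqrt(np(1-p))
    return dev, chi, float(np.sum(dev ** 2)), float(np.sum(chi ** 2))


dev, chi, D, X2 = residual_components([0, 5, 10], [10, 10, 10], [0.2, 0.5, 0.9])
print(dev, chi, D, X2)
```

The components are the values shown in the probability and scatter plots of Appendix B. Their sums of squares give the total deviance and the chi-square goodness-of-fit statistic for the estimation years.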
d. Function VALIDATION
∇ VALIDATION
⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
⍝ CHI-SQUARE STAT. (CHISQ) FOR THE FISCAL YEARS
⍝ FROM 1981 TO 1983.
CHISQ←3⍴0
I←1
INV2←IVAL81[(1+MOS);(1+LOS);(1+GR)]
LOSS2←LVAL81[(1+MOS);(1+LOS);(1+GR)]
→L10
L4:INV2←IVAL82[(1+MOS);(1+LOS);(1+GR)]
LOSS2←LVAL82[(1+MOS);(1+LOS);(1+GR)]
→L10
L5:INV2←IVAL83[(1+MOS);(1+LOS);(1+GR)]
LOSS2←LVAL83[(1+MOS);(1+LOS);(1+GR)]
L10:T1←(,INV2≠0)
NT1←T1/,INV2
YT1←T1/,LOSS2
P←YT1÷NT1
P←(K,1)⍴T1\P
N2←(K,1)⍴(,INV2)

P#AP1*(K,1 ' ' 

D*(PHAT 1X0 
?£*+/£ 

r#is3 [£]*+/(( (ppapi 
i*j+i 



pJ\(.PPAP) 
A(p#An*i ) 



-P)*2 )x/V2x£)+ (PPAPlx (l-PMTl ) ) 



] ;ff= 



= 2 ) / £4 
3 )/£5 



DEF←N-3
Figure A.10 APL Function VALIDATION.

The APL function VALIDATION in Figure A.10 calculates the chi-square statistics for the fiscal years 1981 to 1983. This function uses the global variables IVALXX and LVALXX, where "XX" is a fiscal year from 1981 to 1983.

e. Description of the output variables

In this section we describe the output variables used in the APL functions.

BETA : vector of the regression coefficients 

TETHA : vector of logit(p) where p = y/n 

TETHAT : vector of fitted values 

DEV : vector of components of the deviance 

CHICOM : vector of individual components of χ²

MD : vector of diagonal elements of the projection matrix

CHI : the chi-squared goodness-of-fit statistic for the estimation years

D : total deviance
CHISQ : the vector of chi-squared test statistics for the validation years

DEF : degrees of freedom
APPENDIX B
GRAPHS 



This appendix contains graphical illustrations of the fitting for the estimation of USMC officer attrition rates. Several cases were selected from the USMC manpower data to illustrate whether or not the logistic regression model fits the data well. Each case has its own regression. Figures B.1 through B.8 show the following plots for each case:

1. logistic probability plot of components of the deviance

2. logistic probability plot of components of the chi-square

3. scatter plot of fitted values vs. components of the deviance

4. scatter plot of fitted values vs. components of the chi-square
MOS = 3, 0 ≤ LOS ≤ 6, 4 ≤ GR ≤ 6
[Plot panels: LOGISTIC PROB PLOT OF COMPONENTS OF DEVIANCE; LOGISTIC PROB PLOT OF COMPONENTS OF CHI-SQUARE]

Figure B.1 Illustration of fitting for MOS = 3, LOS = 0-6, GR = 4-6.
[Plot panels: plots against FITTED VALUES]

Figure B.2 Illustration of fitting for MOS = 7, LOS = 0-6, GR = 4-6.
Figure B.3 Illustration of fitting for MOS = 13, LOS = 0-6, GR = 4-6.
MOS = 20, 0 ≤ LOS ≤ 6, 4 ≤ GR ≤ 6
[Plot panels: LOGISTIC PROB PLOT OF COMPONENTS OF DEVIANCE; LOGISTIC PROB PLOT OF COMPONENTS OF CHI-SQUARE]

Figure B.4 Illustration of fitting for MOS = 20, LOS = 0-6, GR = 4-6.
[Plot panels: plots against FITTED VALUES]

MOS = 3, 19 ≤ LOS ≤ 29, 7 ≤ GR ≤ 9
[Plot panels: LOGISTIC PROB PLOT OF COMPONENTS OF DEVIANCE; LOGISTIC PROB PLOT OF COMPONENTS OF CHI-SQUARE]

Figure B.5 Illustration of fitting for MOS = 3, LOS = 19-29, GR = 7-9.
[Plot panels: plots against FITTED VALUES]

Figure B.6 Illustration of fitting for MOS = 7, LOS = 19-29, GR = 7-9.
Figure B.7 Illustration of fitting for MOS = 13, LOS = 19-29, GR = 7-9.
MOS = 20, 19 ≤ LOS ≤ 29, 7 ≤ GR ≤ 9
[Plot panels: LOGISTIC PROB PLOT OF COMPONENTS OF DEVIANCE; LOGISTIC PROB PLOT OF COMPONENTS OF CHI-SQUARE]

Figure B.8 Illustration of fitting for MOS = 20, LOS = 19-29, GR = 7-9.